Server on medical data and libraries in the Maghreb (final version)

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.

Internal identifier: 000645 (Main/Exploration); previous: 000644; next: 000646

Authors: Mourad Sarrouti [United States]; Said Ouatik El Alaoui [Morocco]

RBID : pubmed:31980104

Abstract

BACKGROUND AND OBJECTIVE

Question answering (QA), the task of identifying short, accurate answers to users' questions expressed in natural language, is a longstanding problem that has been widely studied in the open domain over the last decades. However, it remains a real challenge in the biomedical domain: most existing systems support only a limited number of question and answer types, and they still need further work to improve their precision on the questions they do support. Here, we present SemBioNLQA, a semantic biomedical QA system that can handle yes/no, factoid, list, and summary natural language questions.

METHODS

This paper describes the architecture and evaluation of SemBioNLQA, an end-to-end biomedical QA system consisting of question classification, document retrieval, passage retrieval, and answer extraction modules. It takes natural language questions as input and outputs both short, precise answers and summaries. SemBioNLQA, which handles four types of questions, relies on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) the PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words, and UMLS concepts for passage retrieval, and (4) the UMLS Metathesaurus, BioPortal synonyms, sentiment analysis, and a term frequency metric for answer extraction.
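The passage-retrieval step relies on the BM25 model. As a minimal sketch of how BM25 ranks passages, assuming the passages and the query are already tokenized and stemmed (UMLS concept identifiers, if used, could simply be appended as extra tokens, which is an assumption here), the Python function below scores each passage against a query. The values k1 = 1.2 and b = 0.75 are common defaults, not values taken from the paper, and this is not the SemBioNLQA implementation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, passages, k1=1.2, b=0.75):
    """Score each tokenized passage against the query with Okapi BM25."""
    n = len(passages)
    avgdl = sum(len(p) for p in passages) / n
    # Document frequency: number of passages containing each term at least once.
    df = Counter(term for p in passages for term in set(p))
    scores = []
    for p in passages:
        tf = Counter(p)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # IDF with +1 inside the log so common terms never score negatively.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and passage-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(p) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical stemmed passages, for illustration only.
    passages = [
        ["metformin", "treat", "type", "2", "diabet"],
        ["insulin", "regul", "blood", "glucos"],
        ["metformin", "lower", "blood", "glucos", "level"],
    ]
    scores = bm25_scores(["metformin", "glucos"], passages)
    ranking = sorted(range(len(passages)), key=scores.__getitem__, reverse=True)
    print(ranking)  # prints [2, 1, 0]
```

The passage matching both query terms ranks first; between the two single-match passages, the shorter one wins because of length normalization.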

RESULTS AND CONCLUSION

Compared with current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, can potentially handle a large number of question and answer types. SemBioNLQA quickly meets users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid, and list questions, whereas it returns only ideal answers for summary questions. Moreover, experimental evaluations on the biomedical questions and answers provided by the BioASQ challenge, specifically its 2015, 2016, and 2017 editions (in which we participated), show that SemBioNLQA performs well compared with current state-of-the-art systems and offers a practical, competitive alternative for helping information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
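The exact answer therefore depends on the question type. The sketch below illustrates that dispatch under explicit assumptions: candidate entity names per passage are assumed to be already extracted (the abstract cites the UMLS Metathesaurus and BioPortal synonyms for this, which the sketch does not use); factoid and list candidates are ranked by raw term frequency across the retrieved passages; and a crude negation-cue vote stands in for the sentiment analysis the abstract mentions. The abstract does not state which technique serves which question type, so that mapping is also an assumption. The cue list, function names, and top_n cutoffs are hypothetical, not SemBioNLQA's.

```python
import re
from collections import Counter

# Hypothetical negation cues: a toy stand-in for real sentiment analysis.
NEGATION_CUES = {"not", "no", "never", "neither", "nor", "without"}


def answer_yes_no(sentences):
    """Toy polarity vote: 'no' if most sentences contain a negation cue, else 'yes'."""
    negative = sum(
        1 for s in sentences if NEGATION_CUES & set(re.findall(r"[a-z]+", s.lower()))
    )
    return "no" if negative > len(sentences) / 2 else "yes"


def rank_entities(passage_entities, top_n):
    """Rank candidate entity names by total frequency across retrieved passages."""
    counts = Counter(e for entities in passage_entities for e in entities)
    # most_common keeps first-seen order among equal counts.
    return [entity for entity, _ in counts.most_common(top_n)]


def exact_answer(question_type, sentences, passage_entities):
    """Dispatch on question type; summary questions get no exact answer."""
    if question_type == "yesno":
        return answer_yes_no(sentences)
    if question_type == "factoid":
        return rank_entities(passage_entities, top_n=5)  # hypothetical cutoff
    if question_type == "list":
        return rank_entities(passage_entities, top_n=10)  # hypothetical cutoff
    if question_type == "summary":
        return None  # only an ideal (summary) answer is produced
    raise ValueError(f"unknown question type: {question_type}")
```

For instance, `rank_entities([["BRCA1", "TP53"], ["BRCA1"], ["EGFR", "BRCA1", "TP53"]], 2)` returns `["BRCA1", "TP53"]`, since BRCA1 occurs in all three passages and TP53 in two.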


DOI: 10.1016/j.artmed.2019.101767
PubMed: 31980104



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.</title>
<author>
<name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
<affiliation wicri:level="2">
<nlm:affiliation>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD. Electronic address: sarrouti.mourad@gmail.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Maryland</region>
</placeName>
<wicri:cityArea>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
<affiliation wicri:level="1">
<nlm:affiliation>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.</nlm:affiliation>
<country xml:lang="fr">Maroc</country>
<wicri:regionArea>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez</wicri:regionArea>
<wicri:noRegion>Fez</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31980104</idno>
<idno type="pmid">31980104</idno>
<idno type="doi">10.1016/j.artmed.2019.101767</idno>
<idno type="wicri:Area/PubMed/Corpus">000174</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000174</idno>
<idno type="wicri:Area/PubMed/Curation">000173</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000173</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000093</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000093</idno>
<idno type="wicri:Area/Main/Merge">000645</idno>
<idno type="wicri:Area/Main/Curation">000645</idno>
<idno type="wicri:Area/Main/Exploration">000645</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.</title>
<author>
<name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
<affiliation wicri:level="2">
<nlm:affiliation>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD. Electronic address: sarrouti.mourad@gmail.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Maryland</region>
</placeName>
<wicri:cityArea>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
<affiliation wicri:level="1">
<nlm:affiliation>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.</nlm:affiliation>
<country xml:lang="fr">Maroc</country>
<wicri:regionArea>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez</wicri:regionArea>
<wicri:noRegion>Fez</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Artificial intelligence in medicine</title>
<idno type="eISSN">1873-2860</idno>
<imprint>
<date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms (MeSH)</term>
<term>Automation (MeSH)</term>
<term>Biomedical Technology (methods)</term>
<term>Humans (MeSH)</term>
<term>Information Storage and Retrieval (MeSH)</term>
<term>Machine Learning (MeSH)</term>
<term>Medical Informatics (methods)</term>
<term>Natural Language Processing (MeSH)</term>
<term>PubMed (MeSH)</term>
<term>Unified Medical Language System (MeSH)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes (MeSH)</term>
<term>Apprentissage machine (MeSH)</term>
<term>Automatisation (MeSH)</term>
<term>Humains (MeSH)</term>
<term>Informatique médicale (méthodes)</term>
<term>Mémorisation et recherche des informations (MeSH)</term>
<term>PubMed (MeSH)</term>
<term>Technologie biomédicale (méthodes)</term>
<term>Traitement du langage naturel (MeSH)</term>
<term>Unified medical language system (USA) (MeSH)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Biomedical Technology</term>
<term>Medical Informatics</term>
</keywords>
<keywords scheme="MESH" qualifier="méthodes" xml:lang="fr">
<term>Informatique médicale</term>
<term>Technologie biomédicale</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Automation</term>
<term>Humans</term>
<term>Information Storage and Retrieval</term>
<term>Machine Learning</term>
<term>Natural Language Processing</term>
<term>PubMed</term>
<term>Unified Medical Language System</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Apprentissage machine</term>
<term>Automatisation</term>
<term>Humains</term>
<term>Mémorisation et recherche des informations</term>
<term>PubMed</term>
<term>Traitement du langage naturel</term>
<term>Unified medical language system (USA)</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<b>BACKGROUND AND OBJECTIVE</b>
</p>
<p>Question answering (QA), the task of identifying short, accurate answers to users' questions expressed in natural language, is a longstanding problem that has been widely studied in the open domain over the last decades. However, it remains a real challenge in the biomedical domain: most existing systems support only a limited number of question and answer types, and they still need further work to improve their precision on the questions they do support. Here, we present SemBioNLQA, a semantic biomedical QA system that can handle yes/no, factoid, list, and summary natural language questions.</p>
</div>
<div type="abstract" xml:lang="en">
<p>
<b>METHODS</b>
</p>
<p>This paper describes the architecture and evaluation of SemBioNLQA, an end-to-end biomedical QA system consisting of question classification, document retrieval, passage retrieval, and answer extraction modules. It takes natural language questions as input and outputs both short, precise answers and summaries. SemBioNLQA, which handles four types of questions, relies on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) the PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words, and UMLS concepts for passage retrieval, and (4) the UMLS Metathesaurus, BioPortal synonyms, sentiment analysis, and a term frequency metric for answer extraction.</p>
</div>
<div type="abstract" xml:lang="en">
<p>
<b>RESULTS AND CONCLUSION</b>
</p>
<p>Compared with current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, can potentially handle a large number of question and answer types. SemBioNLQA quickly meets users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid, and list questions, whereas it returns only ideal answers for summary questions. Moreover, experimental evaluations on the biomedical questions and answers provided by the BioASQ challenge, specifically its 2015, 2016, and 2017 editions (in which we participated), show that SemBioNLQA performs well compared with current state-of-the-art systems and offers a practical, competitive alternative for helping information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.</p>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Maroc</li>
<li>États-Unis</li>
</country>
<region>
<li>Maryland</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Maryland">
<name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
</region>
</country>
<country name="Maroc">
<noRegion>
<name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
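The record above can also be read programmatically. One caveat: it uses the nlm: and wicri: prefixes without declaring them, so a namespace-aware parser such as Python's xml.etree.ElementTree rejects it as is ("unbound prefix"). The minimal sketch below assumes the record is available as a string without an XML declaration. It wraps the record in an element that declares both prefixes with placeholder URIs (hypothetical, not the real NLM or Wicri namespaces) and then reads the author names grouped by country from the affiliations tree.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record uses the nlm: and wicri:
# prefixes without declaring them, which a strict XML parser rejects.
WRAPPER = (
    '<wrapper xmlns:nlm="urn:example:nlm" xmlns:wicri="urn:example:wicri">'
    "{}"
    "</wrapper>"
)


def authors_by_country(record_xml):
    """Map each country in the record's affiliations tree to its author names."""
    root = ET.fromstring(WRAPPER.format(record_xml))
    result = {}
    for country in root.iterfind("./record/affiliations/tree/country"):
        # Names sit under <region> or <noRegion>, so search all descendants.
        result[country.get("name")] = [name.text for name in country.iter("name")]
    return result
```

Applied to this record, it should return `{"États-Unis": ["Mourad Sarrouti"], "Maroc": ["Said Ouatik El Alaoui"]}`, following the order of the `<country>` elements in the tree.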

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/MaghrebDataLibMedV2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000645 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000645 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    MaghrebDataLibMedV2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:31980104
   |texte=   SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:31980104" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MaghrebDataLibMedV2 

Wicri

This area was generated with Dilib version V0.6.38.
Data generation: Wed Jun 30 18:27:05 2021. Site generation: Wed Jun 30 18:34:21 2021